Abstract

In this work, we consider a generalized fault model that can be used to represent a wide range
of failure scenarios, including correlated failures and non-uniform node reliabilities. This fault
model is general in the sense that fault models studied in prior related work, such as the f-total and
f-local models, are special cases of the generalized fault model. Under the generalized fault
model, we explore iterative approximate Byzantine consensus (IABC) algorithms in arbitrary
directed networks. We prove a necessary and sufficient condition for the existence of IABC
algorithms. The use of the generalized fault model helps to gain a better understanding of
IABC algorithms.
1 Introduction



Dolev et al. [?] introduced the notion of approximate Byzantine consensus by relaxing the requirement
of exact consensus [12]. The goal in approximate consensus is to allow the fault-free nodes to agree
on values that are approximately equal to each other (and not necessarily exactly identical). In the
presence of Byzantine faults, while exact consensus is impossible in asynchronous systems [5],
approximate consensus is achievable [4]. The notion of approximate consensus is of interest
in synchronous systems as well, since approximate consensus can be achieved using distributed
algorithms that do not require complete knowledge of the network topology [1]. The rest of the
discussion in this paper assumes a synchronous system.

The fault model assumed in much of the work on Byzantine consensus allows up to f Byzantine
faulty nodes in the network. We will refer to this fault model as the "f-total" fault model [16, 10, 4, 12].
In prior work, other fault models have been explored as well. For instance, in the "f-local"
fault model, up to f neighbors of each node in the network may be faulty [?, ?, 16], and in the
f-fraction model [16], up to an f fraction of the neighbors of each node may be faulty. In this paper,
we consider a generalized fault model (to be described in the next section). The generalized fault
model specifies a "fault domain", which is a collection of feasible fault sets (a similar fault model was
recently presented in [9]). For example, in a system consisting of four nodes, namely, nodes 1, 2, 3
and 4, the fault domain could be specified as $\mathcal{F} = \{\{1\}, \{2, 3, 4\}\}$. Thus, in this case, either node 1
may be faulty, or any subset of nodes in {2, 3, 4} may be faulty. However, node 1 may not be faulty
simultaneously with another node. The new fault model is general in the sense that the other fault
models studied in the literature, such as the f-total, f-local and f-fraction models, are special cases of
the generalized fault model.

Analysis of consensus under the generalized fault model offers some new insights into how
the choice of the fault model affects algorithm design. In particular, we consider "iterative" algorithms
for achieving approximate Byzantine consensus in synchronous point-to-point networks
that are modeled by arbitrary directed graphs. The iterative approximate Byzantine consensus (IABC)
algorithms of interest have the following properties, which we will soon state more formally:

• Initial state of each node is equal to a real-valued input provided to that node. 

• Validity condition: After each iteration of an IABC algorithm, the state of each fault-free node 
must remain in the convex hull of the states of the fault-free nodes at the end of the previous 
iteration. 

• Convergence condition: For any ε > 0, after a sufficiently large number of iterations, the states
of the fault-free nodes are guaranteed to be within ε of each other.

This paper is a generalization of our recent work on IABC algorithms under the f-total fault
model [14, 13]. The contributions of this paper are as follows:

• We identify a necessary condition on the communication graph for the existence of a correct
IABC algorithm under the generalized fault model (Sections 3 and 4).

• We introduce a new IABC algorithm for the generalized fault model (Section 5) that uses
only "local" information.

• A transition matrix representation of the new IABC algorithm is presented (Sections 6.1 and 6.2). This
representation is then used to prove the correctness of the proposed algorithm (Section 6.3).
Since the results here generalize our prior results [14, 13], naturally the proof techniques used here
have some similarities to the prior work. The material in Section 6.3 bears the strongest similarity
to our prior work. The rest of the paper, however, presents results that provide new intuition on
the problem of approximate consensus. In particular, the material in Sections 4 and 5 sheds light on
how the fault model influences the design of IABC algorithms.

2 Models 

Communication Model: The system is assumed to be synchronous. The communication network
is modeled as a simple directed graph $G(\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, \ldots, n\}$ is the set of n nodes, and $\mathcal{E}$
is the set of directed edges between the nodes in $\mathcal{V}$. We assume that $n \ge 2$, since the consensus
problem for n = 1 is trivial. Node i can reliably transmit messages to node j if and only if the
directed edge (i, j) is in $\mathcal{E}$. Each node can send messages to itself as well; however, for convenience,
we exclude self-loops from set $\mathcal{E}$. That is, $(i, i) \notin \mathcal{E}$ for $i \in \mathcal{V}$. With a slight abuse of terminology,
we will use the terms edge and link interchangeably in our presentation.

For each node i, let $N_i^-$ be the set of nodes from which i has incoming edges. That is, $N_i^- = \{ j \mid (j, i) \in \mathcal{E} \}$.
Similarly, define $N_i^+$ as the set of nodes to which node i has outgoing edges. That
is, $N_i^+ = \{ j \mid (i, j) \in \mathcal{E} \}$. Nodes in $N_i^-$ and $N_i^+$ are, respectively, said to be incoming and outgoing
neighbors of node i. Since we exclude self-loops from $\mathcal{E}$, $i \notin N_i^-$ and $i \notin N_i^+$. However, we note
again that each node can indeed send messages to itself.

Generalized Byzantine Failure Model: We consider the Byzantine failure model, with possible 
faulty nodes specified using a "fault domain" T (defined below). A faulty node may misbehave 
arbitrarily. Possible misbehavior includes transmitting incorrect and mismatching (or inconsistent) 
messages to different neighbors. The faulty nodes may collaborate with each other. Moreover, 
the faulty nodes are assumed to have complete knowledge of the execution of the algorithm,
including the states of all the nodes, the algorithm specification, and the network topology. 

The generalized fault model is characterized using a fault domain $\mathcal{F} \subseteq 2^{\mathcal{V}}$ as follows: nodes in
set F may fail during an execution of the algorithm only if there exists a set $F^* \in \mathcal{F}$ such that $F \subseteq F^*$.
Set F is then said to be a feasible fault set.

Definition 1 Set $F \subseteq \mathcal{V}$ is said to be a feasible fault set if there exists $F^* \in \mathcal{F}$ such that $F \subseteq F^*$.



Thus, each set in $\mathcal{F}$ specifies nodes that may all potentially fail during a single execution of the
algorithm (a similar fault model is also considered in [9]). This feature can be used to capture the
notion of correlated failures. For example, consider a system consisting of four nodes, namely,
nodes 1, 2, 3, and 4. Suppose that

$$\mathcal{F} = \{\{1\}, \{2\}, \{3, 4\}\}$$

This definition of $\mathcal{F}$ implies that during an execution either (i) node 1 may fail, or (ii) node 2 may
fail, or (iii) any subset of {3, 4} may fail, and no other combination of nodes may fail (e.g., nodes 1
and 3 cannot both fail in a single execution). In this case, the reason that the set {3, 4} is in the fault
domain may be that the failures of nodes 3 and 4 are correlated.
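To make Definition 1 concrete, the following Python sketch checks feasibility against the four-node fault domain above. The function name `is_feasible` and the representation of a fault domain as a list of Python sets are our own illustrative choices, not part of the model.

```python
def is_feasible(fault_set, fault_domain):
    """Definition 1: F is feasible iff F is a subset of some F* in the fault domain."""
    return any(set(fault_set) <= set(f_star) for f_star in fault_domain)


# The four-node example: node 1 alone, node 2 alone, or any subset of {3, 4}.
FAULT_DOMAIN = [{1}, {2}, {3, 4}]

for candidate in [set(), {1}, {4}, {3, 4}, {1, 3}, {1, 2}]:
    print(sorted(candidate), is_feasible(candidate, FAULT_DOMAIN))
```

Note that feasibility is closed under taking subsets: if F is feasible, so is every subset of F.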

The generalized fault model is also useful to capture variations in node reliability. For instance,
in the above example, nodes 1 and 2 may be more reliable than nodes 3 and 4. Therefore, while
simultaneous failure of nodes 3 and 4 may occur, simultaneous failure of nodes 1 and 2 is less
likely, and the fault domain excludes it: $\{1, 2\} \notin \mathcal{F}$.

Local knowledge of $\mathcal{F}$: To implement our IABC algorithm presented in Section 5, it is sufficient for
each node i to know $N_i^- \cap F$ for each feasible fault set F. In other words, each node only needs to
know which sets of its incoming neighbors may fail simultaneously. Thus, the iterative algorithm
can be implemented using only "local" information regarding $\mathcal{F}$.
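The following sketch illustrates why local knowledge suffices: for any set $S \subseteq N_i^-$, $S \subseteq F^*$ holds if and only if $S \subseteq N_i^- \cap F^*$, so node i can decide whether a set of its incoming neighbors is a feasible fault set from the intersections $N_i^- \cap F^*$ alone. The helper names `local_view` and `locally_feasible` are hypothetical.

```python
def local_view(in_neighbors, fault_domain):
    """Intersections N_i^- ∩ F* for every F* in the fault domain (all node i needs)."""
    n_in = frozenset(in_neighbors)
    return {n_in & frozenset(f_star) for f_star in fault_domain}


def locally_feasible(subset, view):
    """A subset of N_i^- is a feasible fault set iff it lies inside some set of the view."""
    return any(frozenset(subset) <= s for s in view)


# Node 1 with incoming neighbors {2, 3, 4} under the fault domain {{1}, {2}, {3, 4}}.
VIEW = local_view({2, 3, 4}, [{1}, {2}, {3, 4}])
print(sorted(sorted(s) for s in VIEW))
```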

3 Iterative Approximate Byzantine Consensus (IABC) Algorithms 

In this section, we describe the structure of the IABC algorithms of interest, and state the validity 
and convergence conditions that they must satisfy. 

Each node i maintains state $v_i$, with $v_i[t]$ denoting the state of node i at the end of the t-th
iteration of the algorithm. The initial state of node i, $v_i[0]$, is equal to the initial input provided to node
i. At the start of the t-th iteration (t > 0), the state of node i is $v_i[t-1]$. The IABC algorithms of
interest require each node i to perform the following three steps in iteration t, where t > 0.
Note that the faulty nodes may deviate from this specification.

1. Transmit step: Transmit the current state, namely $v_i[t-1]$, on all outgoing edges and the self-loop (to
nodes in $N_i^+$ and node i itself).

2. Receive step: Receive values on all incoming edges and the self-loop (from nodes in $N_i^-$ and itself).
Denote by $r_i[t]$ the vector of values received by node i from its incoming neighbors and itself.
The size of vector $r_i[t]$ is $|N_i^-| + 1$.

3. Update step: Node i updates its state using a transition function $Z_i$ as follows. $Z_i$ is a part of
the specification of the algorithm, and takes the vector $r_i[t]$ as the input.

$$v_i[t] = Z_i(r_i[t]) \qquad (1)$$

The following conditions must be satisfied by an IABC algorithm when the set of faulty nodes (in
a given execution) is F:

• Validity: for all t > 0 and all fault-free nodes $i \in \mathcal{V} - F$,

$$v_i[t] \ge \min_{j \in \mathcal{V} - F} v_j[t-1] \quad \text{and} \quad v_i[t] \le \max_{j \in \mathcal{V} - F} v_j[t-1]$$

• Convergence: for all fault-free nodes $i, j \in \mathcal{V} - F$, $\lim_{t \to \infty} (v_i[t] - v_j[t]) = 0$.
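As an illustration only, the two conditions can be checked on recorded states as follows. States are assumed to be stored in a dictionary keyed by node, and `satisfies_validity` and `spread` are hypothetical helper names. Since convergence is a limit, a finite run can only show the spread shrinking; it cannot prove convergence.

```python
def satisfies_validity(prev, new, fault_free):
    """Each fault-free new state lies between the min and max fault-free previous states."""
    lo = min(prev[j] for j in fault_free)
    hi = max(prev[j] for j in fault_free)
    return all(lo <= new[i] <= hi for i in fault_free)


def spread(states, fault_free):
    """Largest difference between fault-free states; convergence drives this to 0."""
    values = [states[j] for j in fault_free]
    return max(values) - min(values)


# Node 3 is faulty, so its (arbitrary) state is ignored by both checks.
FAULT_FREE = {1, 2}
prev = {1: 0.0, 2: 1.0, 3: 5.0}
new = {1: 0.5, 2: 0.75, 3: -100.0}
print(satisfies_validity(prev, new, FAULT_FREE), spread(new, FAULT_FREE))
```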

An IABC algorithm is said to be correct if it satisfies the above validity and convergence
conditions in the given graph $G(\mathcal{V}, \mathcal{E})$. For a given fault domain $\mathcal{F}$ for graph $G(\mathcal{V}, \mathcal{E})$, the objective
here is to identify necessary and sufficient conditions for the existence of a correct IABC
algorithm.

4 Necessary Condition 

In this section, we develop a necessary condition for the existence of a correct IABC algorithm.
The necessary condition will be proved to be also sufficient in Section 6.

1 For sets X and Y, X − Y contains elements that are in X but not in Y. That is, $X - Y = \{ i \mid i \in X, i \notin Y \}$.
4.1 Preliminaries



To facilitate the statement of the necessary condition, we first introduce the notions of "source 
component" and "reduced graph" using the following three definitions. 

Definition 2 Graph Decomposition: Let H be a directed graph. Partition graph H into strongly
connected components, $H_1, H_2, \ldots, H_h$, where h is a positive integer dependent on graph H, such that

• every pair of nodes within the same strongly connected component has directed paths in H to each
other, and

• for each pair of nodes, say i and j, that belong to two different strongly connected components, either
i does not have a directed path to j in H, or j does not have a directed path to i in H.

Construct a graph $H^d$ wherein each strongly connected component $H_k$ above is represented by vertex $c_k$, and
there is an edge from vertex $c_k$ to vertex $c_l$ ($k \neq l$) if and only if the nodes in $H_k$ have directed paths in H to the nodes
in $H_l$. $H^d$ is called the decomposition graph of H.



It is known that for any directed graph H, the corresponding decomposition graph $H^d$ is a directed
acyclic graph (DAG) [?].

Definition 3 Source Component: Let H be a directed graph, and let $H^d$ be its decomposition graph as per
Definition 2. Strongly connected component $H_k$ of H is said to be a source component if the corresponding
vertex $c_k$ in $H^d$ is not reachable from any other vertex in $H^d$.
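A minimal sketch of Definitions 2 and 3: the function below groups nodes into strongly connected components by mutual reachability and returns the components that have no incoming edge from outside (equivalently, whose vertex in $H^d$ has no incoming edge, and hence is not reachable from any other vertex). Edges are assumed to be given as (j, i) pairs for a link from j to i. The quadratic reachability approach is chosen for clarity; for large graphs one would normally use Tarjan's or Kosaraju's algorithm instead.

```python
from collections import deque


def source_components(nodes, edges):
    """Source components of a directed graph; edges are (j, i) pairs for a link j -> i."""
    adj = {x: set() for x in nodes}
    for j, i in edges:
        adj[j].add(i)

    def reach(u):
        # Breadth-first search: all nodes reachable from u, including u itself.
        seen, queue = {u}, deque([u])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return seen

    reach_sets = {u: reach(u) for u in nodes}
    components, assigned = [], set()
    for u in nodes:
        if u in assigned:
            continue
        # Strongly connected component of u: nodes u reaches that also reach u.
        comp = frozenset(v for v in reach_sets[u] if u in reach_sets[v])
        assigned |= comp
        components.append(comp)
    # Definition 3: a component is a source if no edge enters it from outside.
    return [c for c in components
            if not any(j not in c and i in c for j, i in edges)]


# Nodes 1 and 2 form a cycle; node 4 has no incoming edges; both feed node 3.
EDGES = {(1, 2), (2, 1), (2, 3), (4, 3)}
print(source_components({1, 2, 3, 4}, EDGES))
```

Here {1, 2} and {4} are both source components; adding the link (1, 4) leaves {1, 2} as the only one.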

Definition 4 Reduced Graph: For a given graph $G(\mathcal{V}, \mathcal{E})$ and a feasible fault set F, a reduced graph
$G_F(\mathcal{V}_F, \mathcal{E}_F)$ is obtained as follows:

• Node set is obtained as $\mathcal{V}_F = \mathcal{V} - F$.

• For each node $i \in \mathcal{V}_F$, a feasible fault set $F_x(i)$ is chosen, and then the edge set $\mathcal{E}_F$ is obtained as follows:

- remove from $\mathcal{E}$ all the links incident on the nodes in F, and

- for each $i \in \mathcal{V}_F$ and each $j \in F_x(i) \cap \mathcal{V}_F \cap N_i^-$, remove link (j, i) from $\mathcal{E}$.

Feasible fault sets $F_x(i)$ and $F_x(j)$ chosen for $i \neq j$ may or may not be identical.

Note that for a given $G(\mathcal{V}, \mathcal{E})$ and a given F, multiple reduced graphs $G_F$ may exist, depending
on the choice of the $F_x(i)$ sets above.

4.2 Necessary Condition 

For a correct IABC algorithm to exist, the network graph $G(\mathcal{V}, \mathcal{E})$ must satisfy the necessary
condition stated in Theorem 1 below.

Theorem 1 Suppose that a correct IABC algorithm exists for $G(\mathcal{V}, \mathcal{E})$. Then, any reduced graph $G_F$,
corresponding to any feasible fault set F, must contain exactly one source component.
Proof Sketch: A complete proof is presented in Appendix A. The proof is by contradiction. Let
us assume that a correct IABC algorithm exists, and that for some feasible fault set F and feasible sets
$F_x(i)$ for each $i \in \mathcal{V} - F$, the resulting reduced graph contains two source components. Let L and
R denote the nodes in the two source components, respectively. Thus, L and R are disjoint and
non-empty. Let $C = \mathcal{V} - F - L - R$ be the remaining nodes in the reduced graph; C may or may
not be empty. Assume that the nodes in F (if non-empty) are all faulty, and all the nodes in
L, R, and C (if non-empty) are fault-free. Suppose that each node in L has initial input equal to
m, each node in R has initial input equal to M, where M > m, and each node in C has an input
in the range [m, M]. As elaborated in Appendix A, the faulty nodes can behave in such a manner
that, in each iteration, nodes in L and R are forced to maintain their updated state equal to m and
M, respectively, so as to satisfy the validity condition. This ensures that, no matter how many
iterations are performed, the convergence condition cannot be satisfied. □
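For small graphs, the condition of Theorem 1 can be checked by brute force. The sketch below enumerates every feasible fault set F and, for every fault-free node i, every possible value of $F_x(i) \cap N_i^- \cap \mathcal{V}_F$ (which ranges over exactly the feasible subsets of $N_i^- \cap \mathcal{V}_F$, because feasibility is closed under subsets), and counts the source components of each resulting reduced graph. It is exponential in the graph size and intended only as an illustration; all names are hypothetical, and skipping fault sets that leave no fault-free node is an assumption of this sketch.

```python
from collections import deque
from itertools import combinations, product


def count_source_components(nodes, edges):
    """Number of strongly connected components with no incoming edge from outside."""
    adj = {x: set() for x in nodes}
    for j, i in edges:
        adj[j].add(i)

    def reach(u):
        seen, queue = {u}, deque([u])
        while queue:
            x = queue.popleft()
            for y in adj[x] - seen:
                seen.add(y)
                queue.append(y)
        return seen

    reach_sets = {u: reach(u) for u in nodes}
    count, assigned = 0, set()
    for u in nodes:
        if u not in assigned:
            comp = {v for v in reach_sets[u] if u in reach_sets[v]}
            assigned |= comp
            if not any(j not in comp and i in comp for j, i in edges):
                count += 1
    return count


def feasible_subsets(universe, fault_domain):
    """All subsets of `universe` that are contained in some set of the fault domain."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)
            if any(set(c) <= set(f) for f in fault_domain)]


def satisfies_condition(nodes, edges, fault_domain):
    """True iff every reduced graph G_F has exactly one source component."""
    in_nbrs = {i: {j for j, k in edges if k == i} for i in nodes}
    for F in feasible_subsets(nodes, fault_domain):
        v_f = set(nodes) - F
        if not v_f:
            continue  # assumption: nothing to check when no fault-free node remains
        base = {(j, i) for j, i in edges if j in v_f and i in v_f}
        order = sorted(v_f)
        # F_x(i) ∩ N_i^- ∩ V_F can be any feasible subset of N_i^- ∩ V_F.
        choices = [feasible_subsets(in_nbrs[i] & v_f, fault_domain) for i in order]
        for picked in product(*choices):
            removed = dict(zip(order, picked))
            kept = {(j, i) for j, i in base if j not in removed[i]}
            if count_source_components(v_f, kept) != 1:
                return False
    return True


K4 = {(a, b) for a in range(1, 5) for b in range(1, 5) if a != b}
K3 = {(a, b) for a in range(1, 4) for b in range(1, 4) if a != b}
print(satisfies_condition({1, 2, 3, 4}, K4, [{1}, {2}, {3}, {4}]))
print(satisfies_condition({1, 2, 3}, K3, [{1}, {2}, {3}]))
```

With a fault domain in which any single node may fail, the complete graph on four nodes passes the check while the complete graph on three nodes fails, consistent with the familiar n ≥ 3f + 1 requirement for complete graphs under the f-total model with f = 1.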



5 Algorithm 1 

We will prove that there exists an IABC algorithm, namely Algorithm 1 below, that satisfies
the validity and convergence conditions provided that the graph $G(\mathcal{V}, \mathcal{E})$ satisfies the necessary
condition in Theorem 1. This implies that the necessary condition in Theorem 1 is also sufficient.
Algorithm 1 has the three-step structure described in Section 3. This algorithm is a generalization,
to accommodate the generalized fault model, of iterative algorithms that were analyzed in prior
work [?], including our own prior work [14, 13]. The key difference from
previous algorithms is in the Update step below.
Algorithm 1



1. Transmit step: Transmit the current state $v_i[t-1]$ on all outgoing edges and the self-loop.

2. Receive step: Receive values on all incoming edges and the self-loop. These values form vector
$r_i[t]$ of size $|N_i^-| + 1$ (including the value from node i itself). When a fault-free node expects to
receive a message from an incoming neighbor but does not receive the message, the message
value is assumed to be equal to some default value.

3. Update step: Sort the values in $r_i[t]$ in increasing order (breaking ties arbitrarily). Let D be
a vector of nodes arranged in an order "consistent" with $r_i[t]$: specifically, D(1) is the node
that sent the smallest value in $r_i[t]$, D(2) is the node that sent the second smallest value in
$r_i[t]$, and so on. The size of vector D is also $|N_i^-| + 1$.

From vector $r_i[t]$, eliminate the smallest $f_1$ values and the largest $f_2$ values, where $f_1$ and $f_2$
are defined as follows:

• $f_1$ is the largest number such that there exists a feasible fault set $F' \subseteq N_i^-$ containing
nodes $D(1), D(2), \ldots, D(f_1)$. Recall that $i \notin N_i^-$.

• $f_2$ is the largest number such that there exists a feasible fault set $F'' \subseteq N_i^-$ containing
nodes $D(|N_i^-| - f_2 + 2), D(|N_i^-| - f_2 + 3), \ldots, D(|N_i^-| + 1)$.

F' and F" above may or may not be identical. 

Let $N_i^*[t]$ denote the set of nodes from whom the remaining $|N_i^-| + 1 - f_1 - f_2$ values in $r_i[t]$
were received, and let $w_j$ denote the value received from node $j \in N_i^*[t]$. Note that $i \in N_i^*[t]$, since
$F'$ and $F''$ are subsets of $N_i^-$ and therefore never contain i.
Hence, for convenience, define $w_i = v_i[t-1]$ to be the value node i "receives" from itself.
Observe that if $j \in N_i^*[t]$ is fault-free, then $w_j = v_j[t-1]$.

Define

$$v_i[t] = Z_i(r_i[t]) = \sum_{j \in N_i^*[t]} a_i \, w_j \qquad (2)$$

where

$$a_i = \frac{1}{|N_i^*[t]|} = \frac{1}{|N_i^-| + 1 - f_1 - f_2}$$

The "weight" of each term on the right-hand side of (2) is $a_i$, and these weights add to 1.
Also, $0 < a_i \le 1$. Although $f_1$, $f_2$ and $a_i$ may be different for each iteration t, for simplicity, we
do not explicitly represent this dependence on t in the notation.
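The Update step can be sketched in Python as follows. This is a minimal illustration, assuming that the received values are given as a dictionary from sender to value (with node i mapped to $v_i[t-1]$) and that ties are broken by Python's stable sort; the default value for missing messages is not modeled, and the function names are hypothetical. Because feasibility is closed under subsets, $f_1$ (and symmetrically $f_2$) can be found by growing a prefix (suffix) of D until it either reaches node i or stops being a feasible fault set.

```python
def is_feasible(nodes, fault_domain):
    """Definition 1: the nodes form a subset of some set in the fault domain."""
    return any(set(nodes) <= set(f) for f in fault_domain)


def update_step(i, received, fault_domain):
    """Sketch of the Update step of Algorithm 1 at fault-free node i.

    `received` maps every node in N_i^- and i itself to the value it sent,
    with received[i] = v_i[t-1]. Returns (v_i[t], f1, f2).
    """
    # Vector D: senders ordered by value; Python's stable sort breaks ties.
    D = sorted(received, key=received.get)
    size = len(D)  # |N_i^-| + 1

    # f1: grow the prefix D(1..f1) while it excludes i and stays feasible.
    # Supersets of infeasible sets are infeasible, so the first failure is final.
    f1 = 0
    while f1 < size and D[f1] != i and is_feasible(D[:f1 + 1], fault_domain):
        f1 += 1

    # f2: the same rule applied to suffixes of D.
    f2 = 0
    while (f2 < size and D[size - 1 - f2] != i
           and is_feasible(D[size - 1 - f2:], fault_domain)):
        f2 += 1

    survivors = D[f1:size - f2]  # N_i^*[t]; never empty because it contains i
    a_i = 1.0 / len(survivors)
    return sum(a_i * received[j] for j in survivors), f1, f2


received = {1: 0.5, 2: 0.0, 3: 1.0, 4: 10.0}  # node 1 is i
print(update_step(1, received, [{2}, {3}, {4}]))
print(update_step(1, received, [{1}, {2}, {3, 4}]))
```

When any single one of nodes 2, 3, 4 may fail, node 1 discards one value at each end and averages 0.5 and 1.0. Under the fault domain {{1}, {2}, {3, 4}}, it discards the two largest values, because nodes 3 and 4 may fail together.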



Observe that the $f_1 + f_2$ nodes whose values are eliminated in the Update step above are all in $N_i^-$. Thus,
the above algorithm can be implemented by node i if it knows which of its incoming neighbors
may fail simultaneously; node i does not need to know the entire fault domain $\mathcal{F}$ as such.

The main difference between the above algorithm and IABC algorithms in prior work is in the
choice of the values eliminated from vector $r_i[t]$ in the Update step. The manner in which the values
are eliminated ensures that the values received from nodes $D(f_1 + 1)$ and $D(|N_i^-| - f_2 + 1)$ (i.e., the
smallest and largest values that survive in $r_i[t]$) are within the convex hull of the states of fault-free
nodes, even if nodes $D(f_1 + 1)$ and $D(|N_i^-| - f_2 + 1)$ are not fault-free. This property is useful in
proving algorithm correctness (as discussed below).
6 Sufficiency



We will show that Algorithm 1 satisfies the validity and convergence conditions, provided that $G(\mathcal{V}, \mathcal{E})$
satisfies the condition below, which matches the necessary condition stated in Theorem 1.

Sufficient condition: Any reduced graph $G_F$ corresponding to any feasible fault set F contains exactly one
source component.

In the rest of this section, we assume that $G(\mathcal{V}, \mathcal{E})$ satisfies the above condition. To prove its
sufficiency, we first develop a transition matrix representation of the Update step in Algorithm 1.

6.1 Transition Matrix Representation 

In our discussion below, $\mathbf{M}[t]$ is a square matrix, $\mathbf{M}_i[t]$ is the i-th row of the matrix, and $\mathbf{M}_{ij}[t]$ is
the element at the intersection of the i-th row and j-th column of $\mathbf{M}[t]$.

For a given execution of Algorithm 1, let F denote the actual set of faulty nodes in that execution.
Let $|F| = \psi$. Without loss of generality, suppose that nodes 1 through $(n - \psi)$ are fault-free, and if
$\psi > 0$, nodes $(n - \psi + 1)$ through n are faulty. Denote by v[0] the column vector consisting of the
initial states of all the fault-free nodes. Denote by v[t], where $t \ge 1$, the column vector consisting
of the states of all the fault-free nodes at the end of the t-th iteration. The i-th element of vector v[t]
is state $v_i[t]$. The size of vector v[t] is $(n - \psi)$.

We will show that the iterative update of the state of a fault-free node i ($1 \le i \le n - \psi$) performed
in (2) in Algorithm 1 can be expressed using the matrix form below.

$$v_i[t] = \mathbf{M}_i[t] \, v[t-1] \qquad (3)$$

where $\mathbf{M}_i[t]$ is a stochastic row vector of size $n - \psi$. That is, $\mathbf{M}_{ij}[t] \ge 0$ for $1 \le j \le n - \psi$, and
$\sum_{1 \le j \le n - \psi} \mathbf{M}_{ij}[t] = 1$.² By "stacking" (3) for different i, $1 \le i \le n - \psi$, we will represent the Update
step of Algorithm 1 at all the fault-free nodes together using (4) below.

$$v[t] = \mathbf{M}[t] \, v[t-1] \qquad (4)$$

where $\mathbf{M}[t]$ is an $(n - \psi) \times (n - \psi)$ row stochastic matrix, with its i-th row being equal to $\mathbf{M}_i[t]$ in (3).
$\mathbf{M}[t]$ is said to be a transition matrix.

In the rest of this section, we will first "construct" a transition matrix $\mathbf{M}[t]$ that satisfies certain
desirable properties. Then, we will identify a connection between the transition matrix and the
sufficient condition stated above, and use this connection to establish the convergence property for
Algorithm 1. The validity property also follows from the transition matrix representation.
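The matrix form (4) can be illustrated numerically. The sketch below repeatedly applies a fixed, hypothetical 3 × 3 row stochastic matrix with a positive diagonal; it is not a matrix produced by Algorithm 1, whose $\mathbf{M}[t]$ varies with t. Recording the state vectors lets one observe that each new state stays within the range of the previous fault-free states, as validity requires, and that the states draw together.

```python
def step(M, v):
    """One application of v[t] = M[t] v[t-1]."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


# Hypothetical row stochastic matrix with a positive diagonal (not from Algorithm 1).
M = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
assert all(abs(sum(row) - 1.0) < 1e-12 for row in M)

v = [0.0, 1.0, 2.0]
history = [v]
for _ in range(100):
    v = step(M, v)
    history.append(v)

print(max(v) - min(v))
```

This particular matrix is also column stochastic, so the average of the states (1.0) is preserved and every state approaches it; a general row stochastic matrix only guarantees that the states stay within the previous range.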

6.2 Construction of the Transition Matrix 

We will construct a transition matrix with the property described in Lemma 1 below.

Lemma 1 The Update step of Algorithm 1 at the fault-free nodes can be expressed using a row stochastic
transition matrix $\mathbf{M}[t]$, such that there exists a feasible fault set $F_x(i)$ for each $i \in \mathcal{V} - F$ such that, for all
$j \in \{i\} \cup ((\mathcal{V} - F - F_x(i)) \cap N_i^-)$,

$$\mathbf{M}_{ij}[t] \ge \beta$$

where β is a constant (to be defined later), and $0 < \beta \le 1$.

2 In addition to t, the row vector $\mathbf{M}_i[t]$ may depend on the state vector v[t − 1] as well as the behavior of the faulty
nodes in F. For simplicity, the notation $\mathbf{M}_i[t]$ does not explicitly represent this dependence.

In [13] as well, we construct a transition matrix to prove the correctness of an IABC algorithm under
the f-total fault model. However, the generalized fault model introduces additional complexity,
which is handled here using a new approach to constructing the transition matrix.

Proof: We prove Lemma 1 by constructing $\mathbf{M}_i[t]$ for $1 \le i \le n - \psi$ that satisfies
the conditions in Lemma 1. Recall that F is the set of faulty nodes, and $|F| = \psi$. As stated before,
without loss of generality, nodes 1 through $n - \psi$ are assumed to be fault-free, and the remaining
nodes faulty.

Consider a fault-free node i performing the Update step in Algorithm 1. In the Update step,
recall that the smallest $f_1$ and the largest $f_2$ values are eliminated from $r_i[t]$, where the choice of $f_1$
and $f_2$ is described in Algorithm 1. Let us denote by $\mathcal{S}$ and $\mathcal{L}$, respectively, the set of nodes³ from
whom the smallest $f_1$ and the largest $f_2$ values were received by node i in iteration t. Define sets
$\mathcal{S}_g$ and $\mathcal{L}_g$ to be the subsets of $\mathcal{S}$ and $\mathcal{L}$ that contain all the fault-free nodes in $\mathcal{S}$ and $\mathcal{L}$, respectively.
That is, $\mathcal{S}_g = \mathcal{S} \cap (\mathcal{V} - F)$ and $\mathcal{L}_g = \mathcal{L} \cap (\mathcal{V} - F)$.

Construction of $\mathbf{M}_i[t]$ differs somewhat depending on whether sets $\mathcal{S}_g$, $\mathcal{L}_g$ and $N_i^*[t] \cap F$ are
empty or non-empty. We divide the possibilities into 6 separate cases. Due to space limitations,
here we present the construction for one of the cases (named Case I). The construction for the
remaining cases is presented in Appendix B.

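In Case I, $\mathcal{S}_g \neq \emptyset$, $\mathcal{L}_g \neq \emptyset$, and $N_i^*[t] \cap F \neq \emptyset$. Let $m_S$ and $m_L$ be defined as shown below. Recall
that the nodes in $\mathcal{S}_g$ and $\mathcal{L}_g$ are all fault-free, and therefore, for any node $j \in \mathcal{S}_g \cup \mathcal{L}_g$, $w_j = v_j[t-1]$
(in the notation of Algorithm 1).

$$m_S = \frac{\sum_{j \in \mathcal{S}_g} v_j[t-1]}{|\mathcal{S}_g|} \quad \text{and} \quad m_L = \frac{\sum_{j \in \mathcal{L}_g} v_j[t-1]}{|\mathcal{L}_g|}$$

Now, consider any node $k \in N_i^*[t]$. By the definition of sets $\mathcal{S}_g$ and $\mathcal{L}_g$, $m_S \le w_k \le m_L$. Therefore,
we can find weights $S_k \ge 0$ and $L_k \ge 0$ such that $S_k + L_k = 1$, and

$$w_k = S_k m_S + L_k m_L \qquad (5)$$

Clearly, at least one of $S_k$ and $L_k$ must be $\ge 1/2$. We now define the elements $\mathbf{M}_{ij}[t]$ of row $\mathbf{M}_i[t]$: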

• For $j \in N_i^*[t] \cap (\mathcal{V} - F)$: In this case, j is either a fault-free incoming neighbor of i, or i itself.
For each such j, define $\mathbf{M}_{ij}[t] = a_i$. This is obtained by observing in (2) that the contribution
of such a node j to the new state $v_i[t]$ is $a_i w_j = a_i v_j[t-1]$.

The elements of $\mathbf{M}_i[t]$ defined here add up to

$$|N_i^*[t] \cap (\mathcal{V} - F)| \, a_i$$

3 Although $\mathcal{S}$ and $\mathcal{L}$ may be different for each t, for simplicity, we do not explicitly represent this dependence on t in
the notations $\mathcal{S}$ and $\mathcal{L}$.
• For $j \in \mathcal{S}_g \cup \mathcal{L}_g$: In this case, j is a fault-free node in $\mathcal{S}$ or $\mathcal{L}$.
For each $j \in \mathcal{S}_g$,

$$\mathbf{M}_{ij}[t] = \frac{a_i}{|\mathcal{S}_g|} \sum_{k \in N_i^*[t] \cap F} S_k$$

and for each node $j \in \mathcal{L}_g$,

$$\mathbf{M}_{ij}[t] = \frac{a_i}{|\mathcal{L}_g|} \sum_{k \in N_i^*[t] \cap F} L_k$$

To obtain these two expressions, we represent the value $w_k$ sent by each faulty node k in $N_i^*[t]$,
i.e., $k \in N_i^*[t] \cap F$, using (5). Recall that this node k contributes $a_i w_k$ to (2). The above two
expressions are then obtained by multiplying (5) by $a_i$, summing over all the faulty nodes in $N_i^*[t] \cap F$, and
replacing this sum by equivalent contributions from nodes in $\mathcal{S}_g$ and $\mathcal{L}_g$.

The elements of $\mathbf{M}_i[t]$ defined here add up to

$$a_i \sum_{k \in N_i^*[t] \cap F} (S_k + L_k) = |N_i^*[t] \cap F| \, a_i.$$

• For $j \in (\mathcal{V} - F) - (N_i^*[t] \cup \mathcal{S}_g \cup \mathcal{L}_g)$: These fault-free nodes have not yet been considered
above. For each such node j, define $\mathbf{M}_{ij}[t] = 0$.

With the above definition of $\mathbf{M}_i[t]$, it should be easy to see that $\mathbf{M}_i[t] \, v[t-1]$ is, in fact, identical
to $v_i[t]$ obtained using (2). Thus, the above construction of $\mathbf{M}_i[t]$ results in the contribution of the
faulty nodes in $N_i^*[t]$ to (2) being replaced by an equivalent contribution from fault-free nodes in
$\mathcal{L}_g$ and $\mathcal{S}_g$.
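The Case I construction can be checked numerically. The hypothetical helper below builds the row $\mathbf{M}_i[t]$ from the quantities defined above, choosing $S_k = (m_L - w_k)/(m_L - m_S)$ (any split satisfying (5) would do), and the usage example verifies that $\mathbf{M}_i[t]\,v[t-1]$ reproduces (2). The caller is assumed to supply an execution that actually falls into Case I.

```python
def case1_row(a_i, survivors, state, faulty_values, S_g, L_g):
    """Hypothetical helper: row M_i[t] for Case I of Lemma 1.

    state:         v_j[t-1] for every fault-free node j (one column per node)
    survivors:     N_i^*[t], the senders whose values survived trimming
    faulty_values: w_k for every faulty node k in N_i^*[t]
    S_g, L_g:      fault-free senders of the eliminated smallest / largest values
    """
    m_S = sum(state[j] for j in S_g) / len(S_g)
    m_L = sum(state[j] for j in L_g) / len(L_g)

    # Equation (5): w_k = S_k m_S + L_k m_L with S_k + L_k = 1 and S_k, L_k >= 0.
    total_S = total_L = 0.0
    for w_k in faulty_values.values():
        S_k = (m_L - w_k) / (m_L - m_S) if m_L > m_S else 1.0
        total_S += S_k
        total_L += 1.0 - S_k

    row = {j: 0.0 for j in state}           # third bullet: all other nodes get 0
    for j in survivors:
        if j in state:                      # first bullet: fault-free survivors
            row[j] = a_i
    for j in S_g:                           # second bullet: redistribute faulty weight
        row[j] = a_i * total_S / len(S_g)
    for j in L_g:
        row[j] = a_i * total_L / len(L_g)
    return row


# Node 1 is i; node 6 is faulty and its value 0.4 survived trimming.
state = {1: 0.5, 2: 0.0, 3: 0.7, 4: 2.0}
row = case1_row(1 / 3, [6, 1, 3], state, {6: 0.4}, S_g=[2], L_g=[4])
print(row)
```

In this example, node 2 sent the eliminated smallest value and node 4 the eliminated largest value. The row sums to 1, and its product with the fault-free states equals (0.4 + 0.5 + 0.7)/3, the value given by (2).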

Properties of $\mathbf{M}_i[t]$: First, we show that $\mathbf{M}_i[t]$ is a stochastic row vector. Observe that all the elements of
$\mathbf{M}_i[t]$ are non-negative. Also, all the elements of $\mathbf{M}_i[t]$ above add up to

$$|N_i^*[t] \cap (\mathcal{V} - F)| \, a_i + |N_i^*[t] \cap F| \, a_i = |N_i^*[t]| \, a_i = 1$$

because $a_i = 1/|N_i^*[t]|$ as defined in Algorithm 1. Thus, $\mathbf{M}_i[t]$ is a stochastic row vector.

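Recall from the above discussion that, for $k \in N_i^*[t]$, one of $S_k$ and $L_k$ must be $\ge 1/2$. Without
loss of generality, assume that $S_s \ge 1/2$ for some $s \in N_i^*[t] \cap F$. Consequently, for each node $j \in \mathcal{S}_g$,
$\mathbf{M}_{ij}[t] \ge \frac{a_i}{|\mathcal{S}_g|} S_s \ge \frac{a_i}{2|\mathcal{S}_g|}$. Also, for each fault-free node j in $N_i^*[t]$, $\mathbf{M}_{ij}[t] = a_i$. Thus, if β is chosen such
that

$$0 < \beta \le \frac{a_i}{2|\mathcal{S}_g|}$$

and $F_x(i)$ is defined to be equal to $\mathcal{L}$ (which is a feasible fault set, since it is contained in $F''$), then the condition in the lemma holds for node i. That is,
$\mathbf{M}_{ij}[t] \ge \beta$ for $j \in \{i\} \cup ((\mathcal{V} - F - F_x(i)) \cap N_i^-)$.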

All Cases Together: Using similar constructions in the other cases as well (presented in Appendix
B) and a suitable choice of β (presented in Appendix C), we can obtain a row stochastic matrix
$\mathbf{M}[t]$, and for each $i \in \mathcal{V} - F$ identify a feasible fault set $F_x(i)$, such that $\mathbf{M}_{ij}[t] \ge \beta$ for all
$j \in \{i\} \cup ((\mathcal{V} - F - F_x(i)) \cap N_i^-)$. Thus, Lemma 1 is proved.

□ 
6.3 Validity and Convergence of Algorithm 1

The rest of the proof structure is derived from our previous work, wherein we proved the correctness
of an IABC algorithm for the f-total fault model [13]. Let $R_F$ denote the set of all the reduced graphs
of $G(\mathcal{V}, \mathcal{E})$ corresponding to a feasible fault set F. Let $\tau = |R_F|$; $\tau$ depends on F and the underlying
network, and is finite.

In this discussion, let us denote a reduced graph by an italic upper case letter, and the corresponding
"connectivity matrix" (defined below) using the same letter in boldface upper case.
Thus, $\mathbf{H}$ denotes the connectivity matrix for graph $H \in R_F$.

Non-zero elements of connectivity matrix $\mathbf{H}$ are defined as follows: (i) for $1 \le i, j \le n - \psi$,
$\mathbf{H}_{ij} = 1$ if and only if (j, i) is an edge of H, and (ii) $\mathbf{H}_{ii} = 1$ for $1 \le i \le n - \psi$. That is, the non-zero elements of row
$\mathbf{H}_i$ correspond to the incoming links at node i and the self-loop at node i. Thus, the connectivity
matrix for any reduced graph in $R_F$ has a non-zero diagonal.
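A small sketch of the connectivity matrix and of the property stated in Lemma 2 below. Nodes are assumed to be listed in a fixed order, edges are (j, i) pairs as in Definition 4, and we read "non-zero column" as a column all of whose entries are non-zero. Only the zero/non-zero pattern matters for non-negative matrices, so Boolean matrix products suffice. The function names are hypothetical.

```python
def connectivity_matrix(nodes, edges):
    """H_ij = 1 iff i == j or (j, i) is an edge; `nodes` fixes the row/column order."""
    idx = {u: r for r, u in enumerate(nodes)}
    H = [[1 if r == c else 0 for c in range(len(nodes))] for r in range(len(nodes))]
    for j, i in edges:
        H[idx[i]][idx[j]] = 1
    return H


def bool_matmul(A, B):
    """Zero/non-zero pattern of A @ B for non-negative matrices."""
    n = len(A)
    return [[int(any(A[r][k] and B[k][c] for k in range(n))) for c in range(n)]
            for r in range(n)]


def has_nonzero_column(H, power):
    """True iff H^power has a column with every entry non-zero."""
    P = H
    for _ in range(power - 1):
        P = bool_matmul(P, H)
    return any(all(P[r][c] for r in range(len(H))) for c in range(len(H)))


NODES = [1, 2, 3]
chain = connectivity_matrix(NODES, {(1, 2), (2, 3)})        # single source {1}
two_sources = connectivity_matrix(NODES, {(1, 3), (2, 3)})  # sources {1} and {2}
print(has_nonzero_column(chain, len(NODES)), has_nonzero_column(two_sources, len(NODES)))
```

The chain 1 → 2 → 3 has a single source component and its third power has an all-non-zero column (the column of node 1); the graph with two source components does not.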

Based on the sufficient condition stated at the start of Section 6 and Lemma 1, we can show the
following key lemmas. The proofs are presented in Appendices D and E.

Lemma 2 For any $H \in R_F$, $\mathbf{H}^{n-\psi}$ has at least one non-zero column.

Lemma 3 For any $t \ge 1$, there exists a graph $H \in R_F$ such that $\beta \mathbf{H} \le \mathbf{M}[t]$ (element-wise).

Theorem 2 Suppose that $G(\mathcal{V}, \mathcal{E})$ satisfies the sufficient condition stated above. Then Algorithm 1 satisfies both
the validity and convergence conditions.

Proof: A complete proof is presented in Appendix F. By repeated application of (4), we can
represent the Update step of Algorithm 1 at the t-th iteration ($t \ge 1$) as:

$$v[t] = \left( \prod_{i=1}^{t} \mathbf{M}[i] \right) v[0] \qquad (8)$$

where $\mathbf{M}[i]$ is constructed as described above. When presenting matrix products, for convenience
of presentation, we adopt the following convention: for $a \le b$, $\prod_{i=a}^{b} \mathbf{A}[i]$ denotes the "backward"
product $\mathbf{A}[b] \mathbf{A}[b-1] \cdots \mathbf{A}[a]$. Thus, $\prod_{i=1}^{t} \mathbf{M}[i]$ in (8) represents $\mathbf{M}[t] \mathbf{M}[t-1] \cdots \mathbf{M}[1]$.

Since each $\mathbf{M}[i]$ is row stochastic, it follows from (4) that Algorithm 1 satisfies the validity
condition: each new state of a fault-free node is a convex combination of the previous states of the fault-free nodes.
Based on Lemmas 2 and 3, we can also show that the rows of $\prod_{i=1}^{t} \mathbf{M}[i]$ become identical
in the limit (as elaborated in Appendix F). This observation and (8) together imply that the states
of the fault-free nodes satisfy the convergence condition too. □
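The backward-product convention and the behavior of the rows can be illustrated with two hypothetical row stochastic matrices applied alternately; again, these are not matrices produced by Algorithm 1. Left-multiplying the running product by each new matrix yields $\mathbf{M}[t]\mathbf{M}[t-1]\cdots\mathbf{M}[1]$. In this example node 1 is the only source and has self-weight 1, so every row of the product approaches [1, 0, 0].

```python
def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def backward_product(Ms):
    """For Ms = [M[1], ..., M[t]], return M[t] M[t-1] ... M[1]."""
    n = len(Ms[0])
    P = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for M in Ms:
        P = matmul(M, P)  # the newest matrix multiplies on the left
    return P


# Hypothetical row stochastic matrices (not produced by Algorithm 1), used alternately.
M_ODD = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
M_EVEN = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5]]
Ms = [M_ODD if t % 2 else M_EVEN for t in range(1, 201)]

P = backward_product(Ms)
gap = max(abs(P[r][c] - P[0][c]) for r in range(3) for c in range(3))
print(gap)
```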

7 Conclusions 

This paper considers a generalized fault model, which can be used to specify more complex failure
patterns, such as correlated failures or non-uniform node reliabilities. Under this fault model,
we prove a tight necessary and sufficient condition for the existence of synchronous iterative approximate
Byzantine consensus algorithms in arbitrary directed graphs. The analysis of consensus
under the generalized fault model sheds new light on how the fault model affects algorithm design.
APPENDIX



A Necessity Proof in Section 4

Now, we present the proof of Theorem 1. The proof is by contradiction. Let us assume that a
correct IABC algorithm exists, and that for some feasible fault set F and feasible sets $F_x(i)$ for each
$i \in \mathcal{V} - F$, the resulting reduced graph contains two source components.

Let L and R denote the nodes in the two source components, respectively. Thus, L and R are
disjoint and non-empty. Let $C = \mathcal{V} - F - L - R$ be the remaining nodes in the reduced graph. C
may or may not be empty. Let us now assume that the nodes in F (if non-empty) are all faulty,
and all the nodes in L, R, and C (if non-empty) are fault-free.

Consider the case when (i) each node in L has initial input m, (ii) each node in R has initial
input M, such that M > m, and (iii) each node in C (if non-empty) has an input in the interval
[m, M].

In the Transmit step of iteration 1 of the IABC algorithm, suppose that the faulty nodes in F
(if non-empty) send $m^- < m$ on outgoing links to nodes in L, send $M^+ > M$ on outgoing links
to nodes in R, and send some arbitrary value in interval [m, M] on outgoing links to nodes in
C (if non-empty). This behavior is possible since nodes in F are Byzantine faulty. Note that
$m^- < m < M < M^+$. Each fault-free node $k \in \mathcal{V} - F$ sends value $v_k[0]$ to the nodes in $N_k^+$ in iteration 1.

Consider any node $i \in L$. Since L is a source component in the reduced graph, it must be true
that $N_i^- \cap (C \cup R) \subseteq N_i^- \cap F_x(i) \cap \mathcal{V}_F$.⁴

Now, node i receives $m^-$ from the nodes in $N_i^- \cap F$, values in [m, M] from the nodes in
$N_i^- \cap (C \cup R)$, and m from the nodes in $\{i\} \cup (N_i^- \cap L)$. Figure 1 illustrates the behavior of faulty
nodes in F and the values received by node i.

Consider the following two cases:

• $N_i^- \cap F$ and $N_i^- \cap (C \cup R)$ are both non-empty: In this case, $(N_i^- \cap F) \subseteq F$ and
$N_i^- \cap (C \cup R) = N_i^- \cap F_x(i) \cap \mathcal{V}_F \subseteq F_x(i)$. From node i's perspective, consider two possible scenarios:
(a) nodes in $N_i^- \cap F$ are all faulty, and the other nodes are fault-free, and (b) nodes in
$N_i^- \cap (C \cup R) = N_i^- \cap F_x(i) \cap \mathcal{V}_F$ are all faulty, and the other nodes are fault-free. Note that,
since $F_x(i)$ is a feasible fault set, $N_i^- \cap F_x(i) \cap \mathcal{V}_F$ is also a feasible fault set. Similarly, since F
is a feasible fault set, $N_i^- \cap F$ is also a feasible fault set.

In scenario (a), from node i's perspective, the fault-free nodes have sent values in interval
[m, M], whereas the faulty incoming neighbors, i.e., nodes in $N_i^- \cap F$, have sent value $m^-$.
According to the validity condition, $v_i[1] \ge m$. On the other hand, in scenario (b), the fault-free
incoming neighbors have sent values $m^-$ and m, where $m^- < m$; so $v_i[1] \le m$, according
to the validity condition. Since node i does not know whether the correct scenario is (a) or
(b), it must update its state to satisfy the validity condition in both cases. Thus, it follows
that $v_i[1] = m$.

4 Explanation: In the reduced graph, there are no incoming links at i from nodes in $N_i^- \cap (C \cup R)$. Thus, any incoming
links in $\mathcal{E}$ from the nodes in $N_i^- \cap (C \cup R)$ must have been removed when constructing $\mathcal{E}_F$ for the reduced graph. Recall
that when constructing $\mathcal{E}_F$, incoming links from nodes in $N_i^- \cap F_x(i) \cap \mathcal{V}_F$ are removed. It should be noted that the
algorithm is performed using the links in $\mathcal{E}$, not the reduced graph. Thus, in the Transmit step, all links in $\mathcal{E}$ are used.
Figure 1: Illustration of the behavior of faulty nodes in $F$ and the value received at node $i$.



• At most one of $N_i^- \cap F$ and $N_i^- \cap (C \cup R)$ is non-empty: Recall that $N_i^- \cap F$ and $N_i^- \cap (C \cup R) \subseteq N_i^- \cap F_x(i) \cap \mathcal{V}_F$ are both feasible fault sets. Since at least one of these two sets is empty, their union, i.e., $(N_i^- \cap F) \cup (N_i^- \cap (C \cup R))$, is also a feasible fault set.

Then, from node $i$'s perspective, it is possible that all the nodes in $(N_i^- \cap F) \cup (N_i^- \cap (C \cup R))$ are faulty, and the rest of the nodes are fault-free. In this situation, the values sent to node $i$ by the fault-free nodes (which are all in $\{i\} \cup (N_i^- \cap L)$) are all $m$, and therefore, $v_i[1]$ must be set to $m$ as per the validity condition.
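The two cases above use only the validity condition, which confines $v_i[1]$ to the range of values sent by fault-free nodes. As a minimal sketch, assuming illustrative numbers chosen here ($m = 0$, $M = 1$, $m^- = -1$, and one extra value 0.3 from $C \cup R$; none of these come from the paper), the Python code below intersects the ranges permitted in the scenarios that node $i$ cannot distinguish. The helper names `hull` and `intersect` are hypothetical.

```python
# Sketch: the validity condition forces v_i[1] = m for a node i in L.
# All numeric values below are hypothetical illustrations.

def hull(values):
    """Smallest closed interval [lo, hi] containing all given values."""
    return (min(values), max(values))

def intersect(a, b):
    """Intersection of two closed intervals, or None if it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

m, M = 0.0, 1.0      # states of the nodes in L and in R
m_minus = -1.0       # value m^- < m sent by the faulty nodes in F

from_F = [m_minus]   # received from N_i^- ∩ F
from_CR = [0.3, M]   # received from N_i^- ∩ (C ∪ R), all in [m, M]
from_L = [m, m]      # node i's own value and values from N_i^- ∩ L

# First case, scenario (a): N_i^- ∩ F is faulty, everyone else fault-free.
allowed_a = hull(from_CR + from_L)
# First case, scenario (b): N_i^- ∩ (C ∪ R) is faulty instead.
allowed_b = hull(from_F + from_L)
# Node i cannot tell (a) from (b), so v_i[1] must satisfy both ranges.
forced = intersect(allowed_a, allowed_b)

# Second case: the union of both sets may be faulty, so only
# {i} ∪ (N_i^- ∩ L) is known to be fault-free.
allowed_union = hull(from_L)

print(forced, allowed_union)
```

In both cases the only admissible new state is $m$, matching the conclusion drawn in the text.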

Hence, $v_i[1] = m$ for each node $i \in L$. Similarly, we can show that $v_j[1] = M$ for each node $j \in R$.

Now consider the nodes in set $C$ (if non-empty). All the values received by the nodes in $C$ are in $[m, M]$; therefore, their new states must also remain in $[m, M]$, as per the validity condition.

The above discussion implies that, at the end of iteration 1, the following conditions hold true: (i) the state of each node in $L$ is $m$, (ii) the state of each node in $R$ is $M$, and (iii) the state of each node in $C$ (if non-empty) is in the interval $[m, M]$. These conditions are identical to the initial conditions listed previously. Then, by a repeated application of the above argument (proof by induction), it follows that for any $t \ge 0$, $v_i[t] = m$ for all nodes $i \in L$, $v_j[t] = M$ for all nodes $j \in R$, and $v_k[t] \in [m, M]$ for all nodes $k \in C$.

Since $L$ and $R$ both contain fault-free nodes, and $m \ne M$, the convergence requirement is not satisfied. This is a contradiction to the assumption that a correct iterative algorithm exists in $G(\mathcal{V}, \mathcal{E})$.
B Construction for Other Cases in Section 6.2



When discussing Case I in Section 6.2, we deferred discussion of the other cases. We present the construction for the rest of the cases here. There are six cases in total:

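• Case I: $\mathcal{S}_g \ne \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$.

• Case II: $\mathcal{S}_g \ne \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F = \emptyset$.

• Case III: $\mathcal{S}_g = \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$.

• Case IV: $\mathcal{S}_g \ne \emptyset$, $\mathcal{L}_g = \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$.

• Case V: $\mathcal{S}_g = \emptyset$, $\mathcal{L}_g = \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$.

• Case VI: at most one of $\mathcal{S}_g$ and $\mathcal{L}_g$ is non-empty, and $N_i^*[t] \cap F = \emptyset$.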

Note that the choice of $f_1$ and $f_2$ in Algorithm 1 ensures that the value from node $i$ itself is never dropped from $r_i[t]$; therefore, $i \in N_i^*[t]$, and $N_i^*[t]$ is always non-empty.
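Because each case is fixed by three yes/no tests, the six cases are exhaustive. A small Python sketch (the function `classify_case` is hypothetical and not part of Algorithm 1) enumerates all eight combinations of the tests and maps each to its case:

```python
from itertools import product

def classify_case(s_g_nonempty, l_g_nonempty, star_has_faulty):
    """Map (S_g != {}, L_g != {}, N_i^*[t] ∩ F != {}) to Cases I-VI."""
    if star_has_faulty:
        if s_g_nonempty and l_g_nonempty:
            return "I"
        if l_g_nonempty:          # S_g empty, L_g non-empty
            return "III"
        if s_g_nonempty:          # S_g non-empty, L_g empty
            return "IV"
        return "V"                # both empty
    # N_i^*[t] contains only fault-free nodes.
    if s_g_nonempty and l_g_nonempty:
        return "II"
    return "VI"                   # at most one of S_g, L_g non-empty

cases = {combo: classify_case(*combo) for combo in product([True, False], repeat=3)}
print(sorted(set(cases.values())))
```

Every combination lands in exactly one case, and Case VI absorbs the three combinations without a faulty node in $N_i^*[t]$ and with at most one of $\mathcal{S}_g$, $\mathcal{L}_g$ non-empty.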

B.1 Case II

Now, we consider the case when $\mathcal{S}_g \ne \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F = \emptyset$. That is, each of $\mathcal{S}$ and $\mathcal{L}$ contains at least one fault-free node, and $N_i^*[t]$ contains only fault-free node(s). In fact, the analysis of Case II is very similar to the analysis presented in Section 6.2 for Case I, where $N_i^*[t]$ does contain a faulty node.

We now discuss how the analysis of Case I can be applied to Case II. Rewrite (2) as follows:



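$$
\begin{aligned}
v_i[t] &= \frac{a_i}{2} v_i[t-1] + \frac{a_i}{2} v_i[t-1] + \sum_{j \in N_i^*[t] - \{i\}} a_i w_j && (9)\\
&= a_i w_z + a_i w_i + \sum_{j \in N_i^*[t] - \{i\}} a_i w_j && (10)
\end{aligned}
$$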



In the above equation, $z$ is to be viewed as a "virtual" incoming neighbor of node $i$, which has sent value $w_z = v_i[t-1]/2$ to node $i$ in iteration $t$. With the above rewriting of the state update, the value received by node $i$ from itself should be viewed as $w_i = v_i[t-1]/2$ instead of $v_i[t-1]$. With this transformation, Case II now becomes identical to Case I, with the virtual node $z$ being treated as an incoming neighbor of node $i$.

In essence, a part of node $i$'s contribution (half, to be precise) is now replaced by an equivalent contribution from the nodes in $\mathcal{L}_g$ and $\mathcal{S}_g$. We now define the elements $\mathbf{M}_{ij}[t]$ of row $\mathbf{M}_i[t]$:

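• For $j = i$: $\mathbf{M}_{ii}[t] = \frac{a_i}{2}$. This is obtained by observing in (10) that node $i$'s contribution to the new state $v_i[t]$ is $a_i w_i = \frac{a_i}{2} v_i[t-1]$.

• For $j \in N_i^*[t] - \{i\}$: In this case, $j$ is a fault-free incoming neighbor of $i$. For each such $j$, define $\mathbf{M}_{ij}[t] = a_i$. This is obtained by observing in (10) that the contribution of node $j$ to the new state $v_i[t]$ is $a_i w_j = a_i v_j[t-1]$.

• For $j \in \mathcal{S}_g \cup \mathcal{L}_g$: In this case, $j$ is a fault-free node in $\mathcal{S}$ or $\mathcal{L}$. For each $j \in \mathcal{S}_g$,

$$\mathbf{M}_{ij}[t] = \frac{a_i S_2}{2|\mathcal{S}_g|},$$

and for each node $j \in \mathcal{L}_g$,

$$\mathbf{M}_{ij}[t] = \frac{a_i L_2}{2|\mathcal{L}_g|},$$

where $S_2 \ge 0$ and $L_2 \ge 0$ are chosen such that $S_2 + L_2 = 1$ and

$$a_i w_z = \frac{a_i v_i[t-1]}{2} = \sum_{j \in \mathcal{S}_g} \frac{a_i S_2}{2|\mathcal{S}_g|} w_j + \sum_{j \in \mathcal{L}_g} \frac{a_i L_2}{2|\mathcal{L}_g|} w_j.$$

Note that such $S_2$ and $L_2$ exist because, by definition of $\mathcal{S}_g$ and $\mathcal{L}_g$, $v_i[t-1] \ge w_j$ for all $j \in \mathcal{S}_g$ and $v_i[t-1] \le w_j$ for all $j \in \mathcal{L}_g$; hence $v_i[t-1]$ lies between the average of the values received from $\mathcal{S}_g$ and the average of the values received from $\mathcal{L}_g$. The two expressions above are then obtained by replacing the contribution of the virtual node $z$ by an equivalent contribution from the nodes in $\mathcal{S}_g$ and $\mathcal{L}_g$, respectively.

• For $j \in (\mathcal{V} - F) - (N_i^*[t] \cup \mathcal{S}_g \cup \mathcal{L}_g)$: These fault-free nodes have not yet been considered above. For each such node $j$, define $\mathbf{M}_{ij}[t] = 0$.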



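By an argument similar to that in Section 6.2, $\mathbf{M}_i[t]$ is row stochastic. Without loss of generality, suppose that $S_2 \ge 1/2$. Then for each node $j \in \mathcal{S}_g$, $\mathbf{M}_{ij}[t] = \frac{a_i S_2}{2|\mathcal{S}_g|} \ge \frac{a_i}{4|\mathcal{S}_g|}$. Also, for each fault-free node $j$ in $N_i^*[t] - \{i\}$, $\mathbf{M}_{ij}[t] = a_i$, and $\mathbf{M}_{ii}[t] = \frac{a_i}{2}$. Recall that, by definition, $|\mathcal{S}_g| \ge 1$, so $\frac{a_i}{4|\mathcal{S}_g|}$ is the smallest of these lower bounds. Hence, if $\beta$ is chosen such that

$$0 < \beta \le \frac{a_i}{4|\mathcal{S}_g|} \qquad (11)$$

and $F_x(i)$ is defined to be equal to $\mathcal{L}$, then the condition in Lemma 1 holds for node $i$. That is, $\mathbf{M}_{ij}[t] \ge \beta$ for $j \in \{i\} \cup ((\mathcal{V} - F - F_x(i)) \cap N_i^-)$.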
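To make the Case II bookkeeping concrete, here is a Python sketch under stated assumptions: the node names and values are hypothetical, $a_i = 1/|N_i^*[t]|$ (as used in the row-sum argument), and every node of $N_i^*[t]$ is fault-free, so $w_j = v_j[t-1]$. The function `case2_row` (our name) picks $S_2$ and $L_2$, builds $\mathbf{M}_i[t]$, and lets one check that the row is stochastic, reproduces the update, and meets the $a_i/(4|\mathcal{S}_g|)$ bound when $S_2 \ge 1/2$.

```python
def case2_row(i, v, star, s_g, l_g):
    """Return (row, a_i, S_2) for node i in Case II (illustrative sketch).

    v    : dict node -> v_j[t-1]
    star : nodes of N_i^*[t] (all fault-free here; must include i)
    s_g  : fault-free nodes of S (each value <= v[i])
    l_g  : fault-free nodes of L (each value >= v[i])
    """
    a_i = 1.0 / len(star)                      # a_i = 1 / |N_i^*[t]|
    avg_s = sum(v[j] for j in s_g) / len(s_g)
    avg_l = sum(v[j] for j in l_g) / len(l_g)
    # S_2, L_2 >= 0 with S_2 + L_2 = 1 and v_i[t-1] = S_2*avg_s + L_2*avg_l.
    S2 = 0.5 if avg_l == avg_s else (avg_l - v[i]) / (avg_l - avg_s)
    L2 = 1.0 - S2
    row = {j: a_i for j in star if j != i}     # fault-free j in N_i^*[t] - {i}
    row[i] = a_i / 2                           # node i keeps half its weight
    for j in s_g:                              # virtual node z's half, split over S_g ...
        row[j] = a_i * S2 / (2 * len(s_g))
    for j in l_g:                              # ... and over L_g
        row[j] = a_i * L2 / (2 * len(l_g))
    return row, a_i, S2

v = {"i": 0.5, "a": 0.4, "b": 0.6, "s1": 0.1, "s2": 0.3, "l1": 0.9}
star = ["i", "a", "b"]
row, a_i, S2 = case2_row("i", v, star, s_g=["s1", "s2"], l_g=["l1"])

update = a_i * sum(v[j] for j in star)         # Algorithm 1 update in this setting
print(sum(row.values()), sum(row[j] * v[j] for j in row), update)
```

With these values $S_2 \approx 0.571 \ge 1/2$, so the entries for $\mathcal{S}_g$ clear the bound in (11), and the nodes in $\mathcal{L}_g$ are the ones covered by choosing $F_x(i) = \mathcal{L}$.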



B.2 Cases III and IV 



Now, we describe the construction for Case III. The construction for Case IV is very similar, and thus is omitted here.

In Case III, $\mathcal{S}_g = \emptyset$, $\mathcal{L}_g \ne \emptyset$, and $N_i^*[t] \cap F \ne \emptyset$. Thus, $\mathcal{S}$ does not contain any fault-free nodes (hence $\mathcal{S}_g$ is empty). This may be due to one of the following two reasons: (i) the set $\mathcal{S}$ is non-empty, but all the nodes in $\mathcal{S}$ are faulty, or (ii) the set $\mathcal{S}$ is empty.

Assume that $l \in \mathcal{L}$ is a fault-free node, and that all the nodes in $\mathcal{S}$ are faulty (i.e., $\mathcal{S}_g = \emptyset$) or that $\mathcal{S}$ is empty (i.e., $f_1 = 0$). In this case, observe that node $D(f_1+1)$ must be fault-free (otherwise, $f_1$ cannot be the largest value as defined in Algorithm 1). Now, consider any faulty node $k \in N_i^*[t] \cap F$. Similar to the argument in Case I, we can find weights $S_k \ge 0$ and $L_k \ge 0$ such that

$$S_k + L_k = 1$$

and

$$w_k = S_k v_{D(f_1+1)}[t-1] + L_k v_l[t-1] \qquad (12)$$
We now define $\mathbf{M}_{ij}[t]$ for all fault-free $j$.

• For $j \in (N_i^*[t] - \{D(f_1+1)\}) \cap (\mathcal{V} - F)$, that is, $j$ is a fault-free node in $N_i^*[t]$ with the exception of $D(f_1+1)$:

For each such $j$, define $\mathbf{M}_{ij}[t] = a_i$. This is obtained by observing in (2) that the contribution of node $j$ to the new state $v_i[t]$ is $a_i w_j = a_i v_j[t-1]$.

The elements of $\mathbf{M}_i[t]$ defined here (including the case of $j = i$) add up to

$$\left(|N_i^*[t] \cap (\mathcal{V} - F)| - 1\right) a_i.$$



• For nodes $D(f_1+1)$ and $l$: Define

$$\mathbf{M}_{i,D(f_1+1)}[t] = a_i + \sum_{k \in N_i^*[t] \cap F} a_i S_k$$

and

$$\mathbf{M}_{il}[t] = \sum_{k \in N_i^*[t] \cap F} a_i L_k.$$

Similar to Case I presented in Section 6.2, these two expressions are obtained by summing up the contribution over the faulty nodes in $N_i^*[t]$, and replacing the sum by an equivalent contribution from the nodes $D(f_1+1)$ and $l$, respectively, according to (12).

The above elements of $\mathbf{M}_i[t]$ add up to

$$a_i\left(1 + \sum_{k \in N_i^*[t] \cap F} (S_k + L_k)\right) = \left(1 + |N_i^*[t] \cap F|\right) a_i.$$



• For $j \in (\mathcal{V} - F) - (N_i^*[t] \cup \{l\})$: These fault-free nodes have not yet been considered above. For each such $j$, define $\mathbf{M}_{ij}[t] = 0$.

As in Case I, in Case III it should be easy to see that $\mathbf{M}_i[t]\,\mathbf{v}[t-1]$ is identical to $v_i[t]$ obtained using (2).



Properties of $\mathbf{M}_i[t]$: All the elements of $\mathbf{M}_i[t]$ are non-negative. The elements of $\mathbf{M}_i[t]$ defined in Case III add up to

$$\left(|N_i^*[t] \cap (\mathcal{V} - F)| - 1\right) a_i + \left(1 + |N_i^*[t] \cap F|\right) a_i = |N_i^*[t]|\, a_i = 1.$$

Thus, $\mathbf{M}_i[t]$ is a stochastic row vector.

In Case III, recall that for any fault-free node $j$ in $N_i^*[t]$ (including $j = D(f_1+1)$ and $j = i$), $\mathbf{M}_{ij}[t] \ge a_i$. Thus, if $\beta$ is chosen such that

$$0 < \beta \le a_i \qquad (13)$$

and $F_x(i)$ is defined to be equal to $\mathcal{L}$, then the condition in Lemma 1 holds for node $i$.
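The Case III row can be built the same way. The Python sketch below is illustrative only: the node names and values are hypothetical, the values from faulty nodes in $N_i^*[t]$ are assumed to lie between $v_{D(f_1+1)}[t-1]$ and $v_l[t-1]$ (which is what makes the weights in (12) non-negative), and $a_i = 1/|N_i^*[t]|$. The function name `case3_row` is ours.

```python
def case3_row(v, star_ff, d, l, faulty_w):
    """Return (row, a_i) for node i in Case III (illustrative sketch).

    v        : dict node -> v_j[t-1] for fault-free nodes
    star_ff  : fault-free nodes of N_i^*[t] (includes i and d)
    d        : node D(f_1 + 1), fault-free, smallest retained value
    l        : a fault-free node of L (value >= every retained value)
    faulty_w : values w_k received from the faulty nodes k in N_i^*[t]
    """
    a_i = 1.0 / (len(star_ff) + len(faulty_w))   # a_i = 1 / |N_i^*[t]|
    row = {j: a_i for j in star_ff}              # M_ij = a_i; d starts at a_i too
    row[l] = 0.0
    span = v[l] - v[d]
    for w_k in faulty_w:
        assert v[d] <= w_k <= v[l], "retained faulty value must lie in [v_d, v_l]"
        # Equation (12): w_k = S_k * v_d + L_k * v_l with S_k + L_k = 1.
        s_k = 1.0 if span == 0 else (v[l] - w_k) / span
        row[d] += a_i * s_k                      # accumulates a_i * sum S_k
        row[l] += a_i * (1.0 - s_k)              # accumulates a_i * sum L_k
    return row, a_i

v = {"i": 0.5, "d": 0.2, "x": 0.6, "l": 0.9}
faulty_w = [0.3, 0.8]          # values sent by two faulty nodes in N_i^*[t]
star_ff = ["i", "d", "x"]
row, a_i = case3_row(v, star_ff, "d", "l", faulty_w)

update = a_i * (sum(v[j] for j in star_ff) + sum(faulty_w))   # Algorithm 1 update
print(sum(row.values()), sum(row[j] * v[j] for j in row), update)
```

The faulty contributions are absorbed entirely into the entries for $D(f_1+1)$ and $l$, so every fault-free node of $N_i^*[t]$ keeps weight at least $a_i$, which is what (13) needs.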
B.3 Case V



Consider Case V, where $N_i^*[t] \cap F \ne \emptyset$ and $\mathcal{S}_g = \mathcal{L}_g = \emptyset$. In this case, it should be easy to see that $N_i^*[t]$ contains at least 3 nodes. In particular, $D(f_1+1)$ must be fault-free (otherwise, $f_1$ cannot be the maximum possible), $D(|N_i^-| - f_2 + 1)$ must be fault-free (otherwise, $f_2$ cannot be the maximum possible), and there is a faulty node in $N_i^*[t]$.

Now this case can be handled similarly to Case III analyzed above. In particular, the entries in $\mathbf{M}_i[t]$ are defined similarly, with $l$ being defined equal to $D(|N_i^-| - f_2 + 1)$. Also, define $F_x(i) = \emptyset$.

Hence, it is easy to see that the properties of $\mathbf{M}_i[t]$ are identical to those in Case III presented above.



B.4 Case VI 

Here, we consider the case when at most one of $\mathcal{S}$ and $\mathcal{L}$ contains a fault-free node and $N_i^*[t] \cap F = \emptyset$. Without loss of generality, suppose that $\mathcal{S}$ contains only faulty nodes, and $\mathcal{L}$ may contain a fault-free node.

In this case, define $\mathbf{M}_{ij}[t] = a_i$ for $j \in N_i^*[t]$, and define $\mathbf{M}_{ij}[t] = 0$ for all other fault-free nodes $j$. Also, define $F_x(i) = \mathcal{L}$.

The properties of $\mathbf{M}_i[t]$ thus defined are identical to those in Case III above.



C Putting Cases Together 



Now, let us consider Cases I through VI together. From the definition of $a_i$ in Algorithm 1, observe that $a_i \ge \frac{1}{|N_i^-| + 1}$ (because $f_1, f_2 \ge 0$). Let us define

$$\alpha = \min_{i \in \mathcal{V}} \frac{1}{|N_i^-| + 1}.$$

Moreover, observe that $|\mathcal{S}_g| \le n$ and $|\mathcal{L}_g| \le n$. Then define $\beta$ as

$$\beta = \frac{\alpha}{4n} \qquad (14)$$

This definition satisfies the constraints on $\beta$ in Cases I through VI (the Case I constraint in Section 6.2, and conditions (11) and (13)). Thus, Lemma 1 holds for all six cases with this choice of $\beta$.



D Proof of Lemma 2 in Section 6.3

Here, we present the proof of the first key lemma used in the sufficiency proof.
Lemma 2 For any $H \in R_F$, $\mathbf{H}^{n-\psi}$ has at least one non-zero column.
Proof: $G(\mathcal{V}, \mathcal{E})$ satisfies the sufficient condition stated at the start of Section 6. Therefore, there exists at least one non-faulty node $k$ in the reduced graph $H$ that has directed paths to all the nodes in $H$ (consisting of the edges in $H$). Since the length of the path from $k$ to any other node in $H$ is at most $n - \psi - 1$, the $k$-th column of matrix $\mathbf{H}^{n-\psi}$ will be non-zero.$^5$ □
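The path-length argument can be checked mechanically. The sketch below assumes the usual connectivity-matrix convention (entry $(i, j)$ is 1 when $j = i$ or the reduced graph has a link from $j$ to $i$, and 0 otherwise); that convention is set in the main text rather than in this appendix, so treat it as an assumption here. A column whose entries are all non-zero in a Boolean power of $\mathbf{H}$ corresponds to a node that reaches every node within that many hops. The 4-node graph is hypothetical.

```python
def connectivity_matrix(n, edges):
    """H[i][j] = 1 if j == i or (j -> i) is a link (assumed convention)."""
    H = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for src, dst in edges:
        H[dst][src] = 1
    return H

def bool_matmul(A, B):
    """Boolean matrix product: entry is 1 iff some k has A[i][k] and B[k][j]."""
    n = len(A)
    return [[1 if any(A[i][k] and B[k][j] for k in range(n)) else 0
             for j in range(n)] for i in range(n)]

def bool_power(H, p):
    """Boolean p-th power of H; entry (i, j) is 1 iff j reaches i in <= p hops."""
    n = len(H)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        R = bool_matmul(R, H)
    return R

def nonzero_columns(A):
    """Indices of columns whose entries are all non-zero."""
    n = len(A)
    return [j for j in range(n) if all(A[i][j] for i in range(n))]

# Hypothetical reduced graph on 4 nodes: node 0 reaches all nodes via 0->1->2->3.
H = connectivity_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 1)])
print(nonzero_columns(bool_power(H, 2)), nonzero_columns(bool_power(H, 3)))
```

With two hops node 0 cannot yet reach node 3, so no column is full; with three hops (one fewer than the number of nodes) column 0 is full, mirroring the bound in the proof and in footnote 5.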



E Proof of Lemma 3 in Section 6.3



Here, we present the proof of the second key lemma used in the sufficiency proof. We start with 
two definitions: 

Definition 5 For matrices $\mathbf{A}$ and $\mathbf{B}$ of identical size, and a scalar $\gamma$, $\gamma\mathbf{B} \le \mathbf{A}$ provided that $\gamma\mathbf{B}_{ij} \le \mathbf{A}_{ij}$ for all $i, j$.

We want to prove the following lemma. 



Lemma 3 For any $t \ge 1$, there exists a graph $H \in R_F$ such that $\beta\mathbf{H} \le \mathbf{M}[t]$.



Proof: Observe that the $i$-th row of the transition matrix $\mathbf{M}[t]$ corresponds to the state update (in Algorithm 1) performed at fault-free node $i$. Recall from Lemma 1 that $\mathbf{M}_{ij}[t] \ge \beta$ for $j \in \{i\} \cup ((\mathcal{V} - F - F_x(i)) \cap N_i^-)$, where $F_x(i)$ is a feasible fault set.

Let us obtain a reduced graph $H$ by choosing $F_x(i)$ for each $i$ as defined in Lemma 1. Then, from the definition of the connectivity matrix $\mathbf{H}$, Lemma 3 follows. □



F Correctness of Algorithm 1 

When presenting matrix products, for convenience of presentation, we adopt the following convention: for $a \le b$, $\prod_{i=a}^{b} \mathbf{A}[i]$ denotes the "backward" product $\mathbf{A}[b]\mathbf{A}[b-1]\cdots\mathbf{A}[a]$.

The proof below is similar to a proof for the $f$-total fault model in our previous work [13]. It is included here for the convenience of the referees.



F.1 Matrix Preliminaries

In the discussion below, we use boldface upper case letters to denote matrices, rows of matrices, and their elements. For instance, $\mathbf{H}$ denotes a matrix, $\mathbf{H}_i$ denotes the $i$-th row of matrix $\mathbf{H}$, and $\mathbf{H}_{ij}$ denotes the element at the intersection of the $i$-th row and the $j$-th column of matrix $\mathbf{H}$.

Definition 6 A vector is said to be stochastic if all the elements of the vector are non-negative, and the 
elements add up to 1. A matrix is said to be row stochastic if each row of the matrix is a stochastic vector. 

5 That is, all the elements of the column will be non-zero. Also, such a non-zero column will exist in $\mathbf{H}^{n-\psi-1}$, too. We use the loose bound of $n - \psi$ to simplify the presentation.
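For a row stochastic matrix $\mathbf{A}$, the coefficients of ergodicity $\delta(\mathbf{A})$ and $\lambda(\mathbf{A})$ are defined as follows:

$$\delta(\mathbf{A}) = \max_{j} \max_{i_1, i_2} \left|\mathbf{A}_{i_1 j} - \mathbf{A}_{i_2 j}\right|$$

$$\lambda(\mathbf{A}) = 1 - \min_{i_1, i_2} \sum_{j} \min(\mathbf{A}_{i_1 j}, \mathbf{A}_{i_2 j})$$

It is easy to show that $0 \le \delta(\mathbf{A}) \le 1$ and $0 \le \lambda(\mathbf{A}) \le 1$, and that the rows of $\mathbf{A}$ are all identical if and only if $\delta(\mathbf{A}) = 0$. Also, $\lambda(\mathbf{A}) = 0$ if and only if $\delta(\mathbf{A}) = 0$.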
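The two coefficients translate directly into code. This is a plain-Python sketch (the function names `delta` and `lam` are ours) that evaluates them on three small example matrices: a generic one, one with identical rows, and a permutation matrix.

```python
from itertools import product

def delta(A):
    """delta(A) = max_j max_{i1,i2} |A[i1][j] - A[i2][j]|."""
    rows = range(len(A))
    return max(abs(A[i1][j] - A[i2][j])
               for j in range(len(A[0]))
               for i1, i2 in product(rows, repeat=2))

def lam(A):
    """lambda(A) = 1 - min_{i1,i2} sum_j min(A[i1][j], A[i2][j]).

    Pairs with i1 == i2 contribute a sum of 1 and never attain a smaller minimum.
    """
    rows = range(len(A))
    return 1 - min(sum(min(x, y) for x, y in zip(A[i1], A[i2]))
                   for i1, i2 in product(rows, repeat=2))

A = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
B = [[0.2, 0.8], [0.2, 0.8]]     # identical rows
P = [[0, 1], [1, 0]]             # permutation matrix

for name, X in (("A", A), ("B", B), ("P", P)):
    print(name, delta(X), lam(X))
```

Matrix `B` has identical rows, so both coefficients are 0; the permutation matrix `P` attains the upper bound 1 for both.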

The next result from [6] establishes a relation between the coefficient of ergodicity $\delta(\cdot)$ of a product of row stochastic matrices and the coefficients of ergodicity $\lambda(\cdot)$ of the individual matrices defining the product.

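Lemma 4 For any $p$ square row stochastic matrices $\mathbf{Q}(1), \mathbf{Q}(2), \ldots, \mathbf{Q}(p)$,

$$\delta(\mathbf{Q}(p)\mathbf{Q}(p-1)\cdots\mathbf{Q}(1)) \le \prod_{i=1}^{p} \lambda(\mathbf{Q}(i)).$$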

Lemma 4 is proved in [6]. It implies that if, for all $i$, $\lambda(\mathbf{Q}(i)) \le 1 - \gamma$ for some $\gamma$, where $0 < \gamma \le 1$, then $\delta(\mathbf{Q}(p)\mathbf{Q}(p-1)\cdots\mathbf{Q}(1))$ will approach zero as $p$ approaches $\infty$. We now define a scrambling matrix [6, 15].



Definition 7 A row stochastic matrix $\mathbf{H}$ is said to be a scrambling matrix if $\lambda(\mathbf{H}) < 1$.

The following lemma follows easily from the above definition of $\lambda(\cdot)$.

Lemma 5 If any column of a row stochastic matrix $\mathbf{H}$ contains only non-zero elements that are all lower bounded by some constant $\gamma$, where $0 < \gamma \le 1$, then $\mathbf{H}$ is a scrambling matrix, and $\lambda(\mathbf{H}) \le 1 - \gamma$.
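Lemma 4 and Lemma 5 can be spot-checked numerically. The sketch below is a sanity check rather than a proof: it draws random positive row stochastic matrices with a fixed seed, compares $\delta$ of their backward product with the product of their $\lambda$ values, and evaluates $\lambda$ for a hypothetical matrix whose first column is bounded below by $\gamma = 0.3$. The helper names are ours.

```python
import random
from itertools import product

def delta(A):
    rows = range(len(A))
    return max(abs(A[i1][j] - A[i2][j])
               for j in range(len(A[0])) for i1, i2 in product(rows, repeat=2))

def lam(A):
    rows = range(len(A))
    return 1 - min(sum(min(x, y) for x, y in zip(A[i1], A[i2]))
                   for i1, i2 in product(rows, repeat=2))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def random_stochastic(n, rng):
    """Random n x n row stochastic matrix with strictly positive entries."""
    M = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(n)]
        s = sum(w)
        M.append([x / s for x in w])
    return M

rng = random.Random(0)
Qs = [random_stochastic(4, rng) for _ in range(6)]   # Q(1), ..., Q(6)

P = Qs[0]
for Q in Qs[1:]:
    P = matmul(Q, P)                                 # backward product Q(6) ... Q(1)

bound = 1.0
for Q in Qs:
    bound *= lam(Q)
lemma4_holds = delta(P) <= bound + 1e-12

# Lemma 5: every entry of column 0 is >= gamma, so lambda(H) <= 1 - gamma.
gamma = 0.3
H = [[0.3, 0.7, 0.0], [0.3, 0.0, 0.7], [0.6, 0.2, 0.2]]
lemma5_holds = lam(H) <= 1 - gamma + 1e-12
print(lemma4_holds, lemma5_holds, delta(P), bound)
```

For the example `H`, the bound of Lemma 5 is attained: rows 0 and 1 overlap only in column 0, so $\lambda(\mathbf{H}) = 0.7$.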



F.2 Correctness of Algorithm 1 

Lemma 6 For any $z \ge 1$, in the product below of $\mathbf{H}[t]$ matrices for $\tau(n-\psi)$ consecutive iterations, at least one column is non-zero.

$$\prod_{t=z}^{z+\tau(n-\psi)-1} \mathbf{H}[t]$$

Proof: Since the above product consists of $\tau(n-\psi)$ connectivity matrices corresponding to graphs in $R_F$, at least one of the connectivity matrices corresponding to the $\tau$ distinct graphs in $R_F$, say matrix $\mathbf{H}^*$, will appear in the above product at least $n - \psi$ times.

Now observe that: (i) by Lemma 2, $(\mathbf{H}^*)^{n-\psi}$ contains a non-zero column, say the $k$-th column is non-zero, and (ii) all the $\mathbf{H}[t]$ matrices in the product contain a non-zero diagonal. These two observations together imply that the $k$-th column in the above product is non-zero. □



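Let us now define a sequence of matrices $\mathbf{Q}(i)$, $i \ge 1$, such that each of these matrices is a product of $\tau(n-\psi)$ of the $\mathbf{M}[t]$ matrices. Specifically,

$$\mathbf{Q}(i) = \prod_{t=(i-1)\tau(n-\psi)+1}^{i\tau(n-\psi)} \mathbf{M}[t] \qquad (15)$$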
From the state update $\mathbf{v}[t] = \mathbf{M}[t]\,\mathbf{v}[t-1]$ and (15), observe that

$$\mathbf{v}[k\tau(n-\psi)] = \left(\prod_{i=1}^{k} \mathbf{Q}(i)\right) \mathbf{v}[0] \qquad (16)$$
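Equations (15) and (16) only regroup the factors of the backward product. The sketch below uses random row stochastic matrices and a small block length `T` standing in for $\tau(n-\psi)$ (both illustrative assumptions), and checks that iterating $\mathbf{v}[t] = \mathbf{M}[t]\mathbf{v}[t-1]$ gives the same vector as applying the grouped product of the $\mathbf{Q}(i)$. The helper `backward_product` follows the convention stated at the start of this appendix.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    """Plain-Python matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def backward_product(mats):
    """For mats = [A[a], ..., A[b]], return A[b] A[b-1] ... A[a]."""
    P = mats[0]
    for A in mats[1:]:
        P = matmul(A, P)
    return P

def random_stochastic(n, rng):
    """Random n x n row stochastic matrix."""
    M = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(n)]
        s = sum(w)
        M.append([x / s for x in w])
    return M

rng = random.Random(1)
n, T, k = 3, 2, 3          # T stands in for tau*(n - psi); values are illustrative
Ms = [random_stochastic(n, rng) for _ in range(k * T)]   # Ms[t-1] plays M[t]
v0 = [0.0, 0.5, 1.0]

# Iterate the state update v[t] = M[t] v[t-1] for t = 1, ..., kT.
v = v0
for M in Ms:
    v = matvec(M, v)

# Group into Q(i) = M[iT] ... M[(i-1)T + 1] as in (15), then apply (16).
Qs = [backward_product(Ms[(i - 1) * T : i * T]) for i in range(1, k + 1)]
w = matvec(backward_product(Qs), v0)
print(max(abs(a - b) for a, b in zip(v, w)))
```

The printed difference is at the level of floating-point rounding, since both sides compute the same product in a different association order.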



Lemma 7 For $i \ge 1$, $\mathbf{Q}(i)$ is a scrambling row stochastic matrix, and

$$\lambda(\mathbf{Q}(i)) \le 1 - \beta^{\tau(n-\psi)}.$$

Proof: $\mathbf{Q}(i)$ is a product of row stochastic matrices ($\mathbf{M}[t]$); therefore, $\mathbf{Q}(i)$ is row stochastic. From Lemma 3, for each $t \ge 1$,

$$\beta\,\mathbf{H}[t] \le \mathbf{M}[t].$$

Therefore,

$$\beta^{\tau(n-\psi)} \prod_{t=(i-1)\tau(n-\psi)+1}^{i\tau(n-\psi)} \mathbf{H}[t] \le \prod_{t=(i-1)\tau(n-\psi)+1}^{i\tau(n-\psi)} \mathbf{M}[t] = \mathbf{Q}(i).$$

By using $z = (i-1)\tau(n-\psi) + 1$ in Lemma 6, we conclude that the matrix product on the left side of the above inequality contains a non-zero column. Therefore, $\mathbf{Q}(i)$ on the right side of the inequality also contains a non-zero column.

Observe that $\tau(n-\psi)$ is finite, and hence, $\beta^{\tau(n-\psi)}$ is non-zero. Since the non-zero terms in the $\mathbf{H}[t]$ matrices are all 1, the non-zero elements in $\prod_{t=(i-1)\tau(n-\psi)+1}^{i\tau(n-\psi)} \mathbf{H}[t]$ must each be $\ge 1$. Therefore, there exists a non-zero column in $\mathbf{Q}(i)$ with all the elements in the column being $\ge \beta^{\tau(n-\psi)}$. Therefore, by Lemma 5, $\lambda(\mathbf{Q}(i)) \le 1 - \beta^{\tau(n-\psi)}$, and $\mathbf{Q}(i)$ is a scrambling matrix. □
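The chain of inequalities in this proof can be traced on a toy example. In the sketch below, the 3-node chain graph, the matrix $\mathbf{M}$, $\beta = 0.25$, and the block length `T = 2` (standing in for $\tau(n-\psi)$) are all hypothetical; the connectivity matrix is assumed to have ones on the diagonal and at $(i, j)$ for each link from $j$ to $i$. The helper `scaled_leq` implements the entrywise comparison of Definition 5.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def scaled_leq(gamma, B, A, tol=1e-12):
    """Definition 5: gamma * B <= A entrywise."""
    return all(gamma * b <= a + tol for rb, ra in zip(B, A) for b, a in zip(rb, ra))

def lam(A):
    n = len(A)
    return 1 - min(sum(min(x, y) for x, y in zip(A[i1], A[i2]))
                   for i1 in range(n) for i2 in range(n))

beta = 0.25
# Connectivity matrix of the chain 0 -> 1 -> 2, with self-loops on the diagonal.
H = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
# A row stochastic M with M_ij >= beta wherever H_ij = 1 (the role of Lemma 3).
M = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]

T = 2                                  # stands in for tau * (n - psi)
HT, Q = H, M
for _ in range(T - 1):
    HT, Q = matmul(H, HT), matmul(M, Q)   # H^T and Q = M^T

col0_H = [row[0] for row in HT]        # all non-zero: node 0 reaches every node
col0_Q = [row[0] for row in Q]         # hence every entry >= beta^T
print(col0_H, col0_Q, lam(Q), 1 - beta ** T)
```

Since all matrices are non-negative, $\beta\mathbf{H} \le \mathbf{M}$ carries over to $\beta^{T}\mathbf{H}^{T} \le \mathbf{M}^{T}$, and the full column of $\mathbf{H}^{T}$ yields a column of $\mathbf{Q}$ bounded below by $\beta^{T}$, as Lemma 5 requires.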

Theorem 2 Suppose that $G(\mathcal{V}, \mathcal{E})$ satisfies the sufficient condition stated above. Algorithm 1 satisfies both the validity and convergence conditions.



Proof: Since $\mathbf{v}[t] = \mathbf{M}[t]\,\mathbf{v}[t-1]$, and $\mathbf{M}[t]$ is a row stochastic matrix, it follows that Algorithm 1 satisfies the validity condition.

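Using Lemma 4 and the definition of $\mathbf{Q}(i)$, and using the inequalities $\lambda(\mathbf{M}[t]) \le 1$ and $\lambda(\mathbf{Q}(i)) \le 1 - \beta^{\tau(n-\psi)} < 1$, we get

$$
\begin{aligned}
\lim_{t\to\infty} \delta\left(\prod_{i=1}^{t} \mathbf{M}[i]\right)
&= \lim_{t\to\infty} \delta\left(\left(\prod_{i=\lfloor t/(\tau(n-\psi))\rfloor \tau(n-\psi)+1}^{t} \mathbf{M}[i]\right)\left(\prod_{i=1}^{\lfloor t/(\tau(n-\psi))\rfloor} \mathbf{Q}(i)\right)\right)\\
&\le \lim_{t\to\infty} \prod_{i=1}^{\lfloor t/(\tau(n-\psi))\rfloor} \lambda(\mathbf{Q}(i)) = 0
\end{aligned}
$$

Thus, the rows of $\prod_{i=1}^{t} \mathbf{M}[i]$ become identical in the limit. This observation, and the fact that $\mathbf{v}[t] = \left(\prod_{i=1}^{t} \mathbf{M}[i]\right)\mathbf{v}[0]$, together imply that the states of the fault-free nodes satisfy the convergence condition. □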
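The argument can be illustrated on the fault-free dynamics $\mathbf{v}[t] = \mathbf{M}[t]\mathbf{v}[t-1]$ alone. The sketch below does not model Byzantine nodes or the trimming step of Algorithm 1; it alternates two hypothetical row stochastic matrices with positive diagonals whose two-step product has a strictly positive column (so each two-step block is scrambling), checks the validity range at every step, and measures the final spread of states.

```python
def step(M, v):
    """One state update v[t] = M[t] v[t-1]."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical transition matrices for 3 fault-free nodes:
# rows sum to 1 and every diagonal entry is positive.
M_a = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
M_b = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5]]

v = [0.0, 0.4, 1.0]
lo, hi = min(v), max(v)
for t in range(1, 201):
    v = step(M_a if t % 2 else M_b, v)
    # Validity: each new state is a convex combination of the previous states.
    assert all(lo - 1e-12 <= x <= hi + 1e-12 for x in v)

print(max(v) - min(v))   # spread of the fault-free states after 200 steps
```

Here the two-step product $\mathbf{M}_b\mathbf{M}_a$ has $\lambda = 0.5$, so by Lemma 4 the spread shrinks geometrically, which is the behavior the proof establishes in general.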






